An Invariant Measure Approach to the
Convergence of Stochastic Approximations with State Dependent Noise


Harold  J.  Kushner  and  Adam  Shwartz 


Abstract 


A new method is presented for quickly getting the ODE (ordinary differential equation) associated with the asymptotic properties of the stochastic approximation $X_{n+1} = X_n + a_n f(X_n, \xi_n)$ (or the projected algorithm for the constrained problem). (Either $a_n \to 0$, or $a_n$ can be constant, in which case the analysis is on the sequence obtained when $a \to 0$.) The method basically requires that $\{X_n, \xi_{n-1}\}$ be Markov with a "Feller" transition function, but little else. The simplest result requires that if $X_n \equiv x$, then the corresponding noise process $\{\xi_n(x), n \ge 0\}$ have a unique invariant measure; but the 'non-unique' case can also be treated. No mixing condition is required, nor the construction of averaged test functions, and $f(\cdot,\cdot)$ need not be continuous. A detailed analysis of the way that $\{\xi_n\}$ varies with $\{X_n\}$ is not required. For the class of sequences treated, the conditions seem easier to verify than for other methods. There are extensions to the non-Markov case. Two examples illustrate the power and ease of use of the approach. Aside from the advantages of the method in treating standard problems, it seems to be particularly useful for handling the type of iterative algorithms which arise in adaptive communication theory, where the dynamics are often discontinuous and the 'noise' is often state dependent due to the effects of feedback.
1. Introduction

We consider stochastic approximations of the form

(1.1)  $X_{n+1} = X_n + a_n f(X_n, \xi_n),$

where $f(\cdot,\cdot)$ might be discontinuous, and the evolution of $\{\xi_n\}$ depends on $\{X_n\}$ in the sense that, in general,

$P\{\xi_{n+1} \in A \mid \xi_i, i \le n\} \ne P\{\xi_{n+1} \in A \mid X_i, \xi_i, i \le n\}.$

We also treat the following "projected" version of (1.1). Let $G$ be a bounded set of the form $G = \{x : q_i(x) \le 0, i = 1, \ldots, s\}$, where the $q_i(\cdot)$ are continuously differentiable, and $G$ is the closure of its interior. Let $\pi_G(y)$ denote any closest point in $G$ to $y$. Then the projected algorithm is

(1.2)  $X_{n+1} = \pi_G(X_n + a_n f(X_n, \xi_n)).$


Several so-called ordinary differential equation (ODE) methods for proving convergence of $\{X_n\}$ have been developed in recent years ([1] to [4], and [5], a more polished form of [4], with weaker conditions). The aim of these methods is to get an ODE, which we write symbolically (for (1.1)) as

(1.3)  $\dot{x} = E^x f(x,\xi) \equiv \int f(x,\xi) P^x(d\xi),$

where (loosely speaking) $P^x(\cdot)$ is the stationary distribution of the sequence $\{\xi_n\}$ when $X_n \equiv x$. The idea is that $\{X_n\}$ in (1.1) varies much more slowly (for large $n$) than $\{\xi_n\}$ does and that some sort of averaging method or law of large numbers can be used to show that the asymptotic properties of $\{X_n\}$ are the same as those of (1.3), with a proper definition of $P^x(\cdot)$.
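The averaging idea behind (1.3) can be illustrated numerically. The following is a minimal sketch, not an example from the paper: the two-state noise chain, its transition probabilities, the choice $f(x,\xi) = \xi - x$ and the gain $a_n = 1/(n+1)$ are all assumptions made for illustration. For this toy model the stationary probability of $\xi = 1$ at fixed $x$ is $0.625 - 0.25x$, so the ODE (1.3) reads $\dot{x} = 0.625 - 1.25x$, with equilibrium $x = 0.5$.

```python
import random


def p_one(x, prev):
    """P{xi_n = 1 | xi_{n-1} = prev, X_n = x}; a hypothetical state dependent kernel."""
    return 0.5 - 0.2 * x if prev == 0 else 0.7 - 0.2 * x


def stationary_p1(x):
    """Stationary probability of xi = 1 for the chain with x held fixed."""
    p01 = p_one(x, 0)          # probability of moving 0 -> 1
    p10 = 1.0 - p_one(x, 1)    # probability of moving 1 -> 0
    return p01 / (p01 + p10)


def mean_field(x):
    """Right side of the ODE (1.3) for f(x, xi) = xi - x."""
    return stationary_p1(x) - x


def run_sa(n_steps, x0=0.0, seed=0):
    """Iterate X_{n+1} = X_n + a_n f(X_n, xi_n) with a_n = 1/(n+1)."""
    rng = random.Random(seed)
    x, xi = x0, 0
    for n in range(n_steps):
        # The noise is drawn from a kernel that depends on the current state x.
        xi = 1 if rng.random() < p_one(x, xi) else 0
        x += (xi - x) / (n + 1)
    return x


if __name__ == "__main__":
    print("final iterate:", run_sa(200_000))
    print("mean field at 0.5:", mean_field(0.5))
```

The iterate stays in $[0,1]$ because each step is a convex combination, which keeps the assumed transition probabilities valid.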

The methods in [1] to [3] are very useful, but are often difficult to apply when the noise is state dependent, in the sense that the conditions are often either hard to verify or do not hold in many important cases of interest. References [4], [5] presented an "averaging method" which works quite well for such problems, although one would like to avoid the work associated with constructing the "averaged test functions" and verifying the conditions on them. The results in [4], [5] were for w.p.1 convergence and also proved stability and similar properties for $\{X_n\}$ sequences which were not artificially bounded. But generally, past methods required what is often a difficult analysis of the way $\{\xi_n\}$ depends on $\{X_n\}$.

In this paper, the essential assumption for the validity of (1.3) is that $\{\xi_n\}$ depends on $\{X_n\}$ in such a way that if $X_n \equiv x$, a constant, then the corresponding $\{\xi_n\}$ process possesses a unique stationary measure. Such an assumption, either implicitly or explicitly, was used in much past work on the 'state-dependent' noise case. If the stationary measures are not unique, then a very similar result (2.9) holds. The conditions required here are generally weaker and much easier to check and are useful even when the noise does not depend on the state. As amply shown by the examples, the method is easy to use. The techniques used are new for the class of problems treated.

We concentrate on the case $a_n \to 0$. The same proofs work (even more easily) when $a_n = a$, a constant. Then, we get that the limit (as $a \to 0$) of $x^a(\cdot)$ satisfies (1.3) or (2.12) (in the constrained case), where $x^a(\cdot)$ is the piecewise linear function with values $X_n$ at $na$. The approach is advantageous for treating many standard problems because the conditions are relatively easy to verify. They are particularly useful for treating the type of algorithms which appear in adaptive communication theory, where the dynamics are often discontinuous and the noise is often state dependent owing to the role of feedback. In such cases, one normally has $a_n = a$.

In Section 2, we discuss the case where $\{X_n, \xi_{n-1}, n \ge 1\}$ is a Markov process, or where the 'state-noise' pair can be 'Markovianized'. This is the case which is most fully understood and easiest to use. A class of non-Markov processes is dealt with in Section 3, and in Section 4 we illustrate the power and ease of use of the method via two examples.

2. Limit Theorems with $(X_n, \xi_{n-1})$ Markov.

In this section, we are concerned with the Markov case. In very many applications the system is either Markovian or the actual physical noise can be Markovianized, perhaps leading to an abstract valued process. Below, it is assumed that $\{X_n\}$ is either tight or lies in a compact set. This is not a very serious restriction, since practical algorithms tend to use various truncation devices. In any case, the use of the projection method (1.2) guarantees the compactness when $X_n$ lies in a Euclidean space.

In Theorems 1 and 2, we allow $X_n$ to take values in a compact metric space, and $\xi_n$ in a complete separable metric space. The reason for this is that it fits certain 'abstract' applications where the metric is defined by a weak topology, and which will be published separately. Also, it facilitates the treatment of non-Markovian problems by 'Markovianizing' in an abstract space.

Some assumptions. Assumptions (A2.4,6) and the first part of (A2.1) will be weakened later, as will the explicit form given for $f$.

A2.1. $\{X_n, \xi_{n-1}, n \ge 0\}$ is a Markov process with a (possibly non-homogeneous) transition function $P(x,\xi,n,k,A) = P\{(X_{n+k}, \xi_{n+k-1}) \in A \mid X_n = x, \xi_{n-1} = \xi\}$. The $X_n$ takes values in a compact subset $H$ of a metric and linear topological space, and $\xi_n$ takes values in $S$, a complete separable metric space. Both metrics are assumed to be invariant.

A2.2. $\{\xi_n\}$ is tight in $S$.

A2.3. For each Borel $B \subset S$, define the one step 'transition function' $P_x(\xi,1,B) = P\{\xi_n \in B \mid \xi_{n-1} = \xi, X_n = x\}$, and suppose that it does not depend on $n$. Let $P_x(\xi,1,\cdot)$ be weakly† continuous in $(x,\xi)$.

For each $x$, we now define a Markov process $\{\xi_n(x), n \ge 0\}$ via the transition function $P_x(\xi,\ell,\cdot)$, where $P_x(\xi,\ell,B) = \int P_x(\xi,\ell-k,d\xi') P_x(\xi',k,B)$ is defined recursively.

A2.4. For each $x \in H$, let $\{\xi_n(x), n \ge 0\}$ have a unique invariant measure $P^x(\cdot)$, and let $\{P^x(\cdot), x \in H\}$ be tight††.

A2.5. $\sum_n |a_{n+1} - a_n| < \infty$, $0 < a_n \to 0$, $\sum_n a_n = \infty$.

A2.6. $f(\cdot,\cdot)$ is bounded.

A2.7. There is an integer $c \ge 0$ such that $\int P_x(\xi,c+1,d\xi') f(x,\xi')$ is continuous in $(x,\xi)$. It equals

$\lim_j \int P(x,\xi,j-c,c,dx'\,d\xi') P_{x'}(\xi',1,d\xi'') f(x',\xi'') = \lim_j E[f(X_j,\xi_j) \mid X_{j-c} = x, \xi_{j-c-1} = \xi],$

where the limit is uniform on compact $(x,\xi)$ sets.

† I.e., $\int P_x(\xi,1,d\xi') g(\xi')$ is continuous in $(x,\xi)$ if $g(\cdot)$ is bounded and continuous.

†† Normally, the tightness holds when (A2.2) holds, so the condition is not restrictive.
Remark on (A2.7). If $f(\cdot,\cdot)$ is continuous, then (A2.3) implies that we can take $c = 0$. If $c = 0$, then the second sentence of (A2.7) is redundant. Even if $f(\cdot,\cdot)$ is not continuous, $c = 0$ is often enough to get the required smoothing. See, for example, the applications in Section 4. Even if $c > 0$ is needed, the second sentence of (A2.7) does not seem to be particularly restrictive, since $|X_{j+1} - X_j| \to 0$ as $j \to \infty$ implies that the measure in the lim expression is essentially $P_x(\xi,c+1,\cdot)$ for large $j$. The assumption is stated as it is for technical reasons. In all applications that we are aware of now, if the first sentence of (A2.7) holds, so does the second sentence.
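For a finite noise space, the objects in (A2.3) and (A2.4) reduce to matrix computations. The sketch below is an illustration under assumed data, not part of the paper: the two-state kernel `one_step` is hypothetical, `k_step` builds $P_x(\xi,k,\cdot)$ by the recursion stated above (A2.4), and `invariant_measure` approximates $P^x(\cdot)$ by iterating the kernel.

```python
def one_step(x):
    """Hypothetical one-step kernel P_x(xi, 1, .) on S = {0, 1}, continuous in x."""
    p01 = 0.5 - 0.2 * x
    p11 = 0.7 - 0.2 * x
    return [[1.0 - p01, p01], [1.0 - p11, p11]]


def k_step(x, k):
    """P_x(xi, k, .) from the recursion P_x(., k, .) = P_x(., k-1, .) P_x(., 1, .)."""
    p1 = one_step(x)
    pk = p1
    for _ in range(k - 1):
        pk = [[sum(pk[i][m] * p1[m][j] for m in range(2)) for j in range(2)]
              for i in range(2)]
    return pk


def invariant_measure(x, iters=200):
    """Approximate the invariant measure P^x by repeated application of the kernel."""
    p1 = one_step(x)
    mu = [0.5, 0.5]
    for _ in range(iters):
        mu = [sum(mu[i] * p1[i][j] for i in range(2)) for j in range(2)]
    return mu


def averaged_drift(x, f):
    """E^x f(x, xi): the integral of f(x, .) against the invariant measure P^x."""
    mu = invariant_measure(x)
    return sum(mu[s] * f(x, s) for s in range(2))
```

For this kernel every row of `k_step(x, k)` approaches `invariant_measure(x)` as `k` grows, which is the finite-state picture of the uniqueness required in (A2.4).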

Before introducing the next assumption, some additional notation is required. Set $t_n = \sum_{i=0}^{n-1} a_i$, and $m(t) = \max\{n : t_n \le t\}$, for $t \ge 0$. Thus $m(t_n) = n$. Let $0 < \delta_n \to 0$ as $n \to \infty$ such that $\lim_n \sup\{a_j : j \ge n\}/\delta_n = 0$. For each $n$ choose a sequence $\{m(n,\ell), \ell = 1, 2, \ldots\}$ where $m(n,1) = n$, $m(n,\ell+1) > m(n,\ell)$, and such that

$\sum_{j=m(n,\ell)}^{m(n,\ell+1)-1} a_j = \delta_n,$

modulo an 'end' value of $a_j$. Thus $(t_{m(n,\ell+1)} - t_{m(n,\ell)})/\delta_n \to 1$ as $n \to \infty$, uniformly in $\ell$. For notational convenience we henceforth suppress the $n$ in $m(n,\ell)$ and write simply $m(n,\ell) = m_\ell$. Let $I_K(\xi)$ denote the indicator of the set where $\xi \in K$.
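The time scale $t_n$, the inverse map $m(t)$ and the block boundaries $m(n,\ell)$ can be computed directly. The following sketch is illustrative only; the function names and the rule of closing a block as soon as its sum reaches $\delta$ (one way to handle the 'end' value) are assumptions, not the paper's construction.

```python
from bisect import bisect_right
from itertools import accumulate


def time_grid(a):
    """Return [t_0, t_1, ...] with t_0 = 0 and t_n = a_0 + ... + a_{n-1}."""
    return [0.0] + list(accumulate(a))


def m_of_t(t_grid, t):
    """m(t) = max{n : t_n <= t}."""
    return bisect_right(t_grid, t) - 1


def blocks(a, n, delta):
    """Block boundaries m(n,1) = n < m(n,2) < ...; each block sum of a_j just reaches delta."""
    marks, total = [n], 0.0
    for j in range(n, len(a)):
        total += a[j]
        if total >= delta:
            marks.append(j + 1)   # next block starts at index j + 1
            total = 0.0
    return marks


if __name__ == "__main__":
    gains = [1.0 / (k + 1) for k in range(20000)]
    grid = time_grid(gains)
    print(m_of_t(grid, grid[57]))           # recovers the index 57
    print(blocks(gains, 1000, 0.05)[:5])    # first few block boundaries
```

Each block sum exceeds $\delta$ by less than one gain $a_j$, so the ratio of block length to $\delta_n$ tends to 1 when $\sup_{j \ge n} a_j/\delta_n \to 0$.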


For each $\omega$, $\ell$, $n$ define the measure on the Borel sets of $S$:

$Q(\omega,\ell,n,\cdot) = \sum_{j=m_\ell}^{m_{\ell+1}-1} \frac{a_j}{\delta_n} P\{\xi_{j-1} \in \cdot \mid X_{m_\ell}, \xi_{m_\ell-1}\}.$

Define $Q_K(\omega,\ell,n,\cdot) = Q(\omega,\ell,n,\cdot) I_K(\xi_{m_\ell-1})$. Thus, if $\xi_{m_\ell-1}(\omega) \notin K$, the measure is the zero measure. If $S$ is not compact, then another assumption is needed. First, we state it (A2.8b) and then discuss it. Either (A2.8a) or (A2.8b) will be used. (A2.8b) always holds for $N(K) = 1$ if $S$ is compact.

A2.8a. Either $S$ is compact or $\{\xi_n\}$ is mutually independent.
or

A2.8b. For each compact $K$, there is an integer $N(K) < \infty$ such that for each $T$ the set

$\{P\{\xi_{j-1} \in \cdot \mid X_{m_\ell}(\omega), \xi_{m_\ell-1}(\omega)\} : \omega$ such that $\xi_{m_\ell-1}(\omega) \in K$; all $n, \ell, j$ with $j \ge m_\ell \ge n$, $j - m_\ell \ge N(K)$, $t_j - t_n \le T\}$

is tight.


Despite its seemingly complicated structure, (A2.8b) is quite natural and is often easy to verify. See, for example, the application in Section 4b, and the example below. It is motivated by the following consideration. If $S$ is not compact, it is possible that

(*)  $\{P\{\xi_{j-1} \in \cdot \mid X_{m_\ell}(\omega), \xi_{m_\ell-1}(\omega)\},\ j \ge m_\ell,\ \omega\}$

is not tight. Suppose for example that $\{\xi_n\}$ is a stationary scalar valued Gauss Markov process, not depending on $\{X_n\}$, and whose correlation function $\rho(\cdot)$ tends to zero. Then the set (*) is not tight, since arbitrarily large initial conditions $\xi_{m_\ell-1}(\omega)$ are allowed. But, if the $\xi_{m_\ell-1}(\omega)$ were all confined to a bounded set, then (*) would be tight. As $K$ increases, and $\xi_{m_\ell-1}(\omega) \in K$, it might take longer for the 'future' $\xi_i$ $(i > m_\ell)$ to 'settle down'. This is why we allow $N(K)$ steps for this 'settling down', where $N(K)$ increases with $K$. In this example, if $K = \{\xi : |\xi| \le k\}$, then any $N(K)$ satisfying $\rho(N(K)) \cdot k \le$ constant is satisfactory.
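The Gauss Markov example can be made concrete. As an assumed special case (not spelled out in the paper), take the stationary AR(1) process $\xi_{n+1} = r\xi_n + \sqrt{1-r^2}\,w_n$ with $0 < r < 1$ and unit-variance Gaussian $w_n$, so that $\rho(N) = r^N$. Conditioned on $\xi_0 = z$ with $|z| \le k$, $\xi_N$ is Gaussian with mean $r^N z$ and variance $1 - r^{2N} \le 1$, so the conditional laws are tight once $r^N k$ is bounded. The helper below, whose name and default constant are hypothetical, returns the smallest such $N$.

```python
def settling_steps(r, k, const=1.0):
    """Smallest N >= 1 with rho(N) * k <= const, where rho(N) = r**N and 0 < r < 1."""
    n = 1
    while (r ** n) * k > const:
        n += 1
    return n


if __name__ == "__main__":
    # N(K) grows (logarithmically) with the radius k of K = {xi : |xi| <= k}.
    for k in (1.0, 8.0, 64.0):
        print(k, settling_steps(0.5, k))
```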

We now take some notation from [2]. Let $x^0(\cdot)$ denote the piecewise linear interpolation of the function with value $X_n$ at $t_n$. Define the shifted function $x^n(\cdot)$ by $x^n(t) = x^0(t + t_n)$, $t \ge 0$. Thus $x^n(0) = X_n$, and the asymptotic properties (as $t \to \infty$) of any limit (as $n \to \infty$) of $\{x^n(\cdot)\}$ yield the asymptotic behavior of $\{X_n\}$. The convergence of $x^n(\cdot)$ to a limit $x(\cdot)$ is in the sense of weak convergence of a sequence of probability measures. We give the differential equations which $x(\cdot)$ satisfies. Using this differential equation and the properties of weak convergence, one can analyze the asymptotic behavior of $X_n$ as in [2]. See that reference for details. Here, we concentrate on proving representations for the differential equations.
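The interpolation $x^0(\cdot)$ and its shifts $x^n(\cdot)$ are simple to compute for scalar iterates. This is a small sketch with assumed function names, taking the grid $t_n$ and the iterates $X_n$ as plain lists.

```python
from bisect import bisect_right


def interp_x0(t_grid, X, t):
    """Piecewise linear interpolation with value X[n] at time t_grid[n]."""
    # Index of the segment containing t, clamped to the last available segment.
    n = min(bisect_right(t_grid, t) - 1, len(X) - 2)
    w = (t - t_grid[n]) / (t_grid[n + 1] - t_grid[n])
    return X[n] + w * (X[n + 1] - X[n])


def shifted(t_grid, X, n):
    """Return the shifted function x^n(t) = x^0(t + t_n), so that x^n(0) = X[n]."""
    return lambda t: interp_x0(t_grid, X, t + t_grid[n])


if __name__ == "__main__":
    grid = [0.0, 1.0, 1.5, 1.75]
    iterates = [0.0, 2.0, 1.0, 3.0]
    x1 = shifted(grid, iterates, 1)
    print(x1(0.0), x1(0.25))
```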

Theorem 1. Under (A2.1) to (A2.8), $\{x^n(\cdot)\}$ is tight and any weak limit $x(\cdot)$ satisfies

(2.1)  $\dot{x} = E^x f(x,\xi) = \int f(x,\xi) P^x(d\xi)$  w.p.1,

where $x(0) \in H$. The right hand side of (2.1) is continuous in $x$.

Proof. For notational simplicity, we do the proof only for the case where $H$ is a subset of a Euclidean space $E^r$. The details in the general case are quite similar. What is actually proved in the general case is that for each bounded real valued $g(\cdot)$ whose first two Frechet derivatives are continuous,

$g(x(t)) - g(x(0)) = \int_0^t g_x(x(u)) \circ E^{x(u)} f(x(u),\xi)\,du$  w.p.1.

The continuity is a consequence of the tightness and uniqueness (A2.4). Now, by the tightness of $\{\xi_n\}$, we can choose $\delta_n \to 0$ and non-decreasing compact $K_n$ such that


the break points of $x^n(\cdot)$ and that $x^n(0) = x^0(t_n) = X_n$.) By (A2.6), $\{x^n(\cdot), F_n(\cdot), n \ge 0\}$ is tight in $C^{2r}[0,\infty)$. Henceforth, we work with a weakly convergent subsequence (called subsequence 1), also indexed by $n$, and with limit $(x(\cdot),F(\cdot))$, or with a subsequence of it. We use Skorokhod imbedding ([7], Theorem 3.1.1) wherever convenient, and with no notational change. Thus, we can assume where convenient, that w.p.1 $(x^n(\cdot),F_n(\cdot))$ converges to $(x(\cdot),F(\cdot))$ uniformly on bounded time intervals. Suppose that

(2.2)  $F(t) = \int_0^t E^{x(u)} f(x(u),\xi)\,du$

and that for arbitrary $k$, $t$, $s$ and $s_1 < s_2 < \cdots < s_k < t < t+s$ and bounded and continuous $h(\cdot)$,

(2.3)  $E\,h(x(s_j),F(s_j), j \le k)\,[x(t+s) - x(t) - (F(t+s) - F(t))] = 0.$

Then $M(t) = x(t) - x(0) - F(t)$ is a continuous martingale with $M(0) = 0$. Since the quadratic variation of $M(\cdot)$ is zero (as can readily be shown), $M(t) = 0$ w.p.1, and (2.1) holds. So, we only need to prove (2.2) and (2.3). For smooth $h(\cdot)$,

(2.4)  $E\,h(x^n(s_j),F_n(s_j), j \le k)\,\Big[x^n(t+s) - x^n(t) - \sum_{j=m(t_n+t)}^{m(t_n+t+s)-1} a_j f(X_j,\xi_j)\Big] = \epsilon_n,$

(2.5)  $E\,h(x^n(s_j),F_n(s_j), j \le k)\,\Big[x^n(t+s) - x^n(t) - \int_t^{t+s} f_n(u)\,du\Big] = \epsilon'_n,$

where $\epsilon_n$ and $\epsilon'_n$ go to zero as $n \to \infty$.

We now prove

(2.6)  $f_n(s) \to E^{x(s)} f(x(s),\xi)$ in probability for each $s$.

The limit (2.6) implies that $F_n(\cdot)$ converges in measure to the right side of (2.2). This and the weak convergence and (2.5) yield (2.3) with $F(\cdot)$ defined by (2.2). So, only (2.6) needs to be proved.

Fix $s$ and $\{m_\ell\}$ such that† $s \in [t_{m_\ell} - t_n, t_{m_{\ell+1}} - t_n)$. Let $N_0$ denote the null set on which $(x^n(\cdot),F_n(\cdot))$ does not converge uniformly to $(x(\cdot),F(\cdot))$ on bounded intervals (under the Skorokhod imbedding). It is enough to show that (for each $s$) each subsequence of subsequence 1 contains a further subsequence for which the limit in (2.6) holds in probability. Select a subsequence†† of subsequence 1, indexed also by $n$ but called subsequence 2, such that $P\{\xi_{m_\ell-1} \in K_n$, all large $n\} = 1$. Let $N(s)$ denote the exceptional null set. Fix $\omega \notin N_0 \cup N(s)$, and extract a weakly convergent subsequence (a subsequence of subsequence 2) of the set of measures (tight by (A2.8) and the properties of $\delta_n$) $\bar{Q} = \{Q_{K_n}(\omega,\ell,n,\cdot) : n \in$ subsequence 2, $s$ fixed as above$\}$, with limit $P_\omega(\cdot)$. The limits in (2.7) below are on this subsequence. Let $g(\cdot)$ be bounded and continuous and set $G(x,\xi) = \int P_x(\xi,1,d\xi') g(\xi')$. Then by (A2.3,5,8)

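(2.7)
$\lim_n \sum_{j=m_\ell}^{m_{\ell+1}-1} \frac{a_j}{\delta_n} \int P\{\xi_j \in d\xi \mid X_{m_\ell}, \xi_{m_\ell-1}\}\, g(\xi)\, I_{K_n}(\xi_{m_\ell-1})$
$= \lim_n \sum_{j=m_\ell}^{m_{\ell+1}-1} \frac{a_j}{\delta_n} \int P\{\xi_{j-1} \in d\xi, X_j \in dx \mid X_{m_\ell}, \xi_{m_\ell-1}\}\, G(x,\xi)\, I_{K_n}(\xi_{m_\ell-1})$
$= \lim_n \int Q_{K_n}(\omega,\ell,n,d\xi)\, G(X_{m_\ell},\xi)$
$= \int P_\omega(d\xi)\, G(x(s),\xi) = \int P_\omega(d\xi)\, P_{x(s)}(\xi,1,d\xi')\, g(\xi').$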


† I.e., for each $n$, choose $m_\ell = m(n,\ell)$ such that $s$ is in the indicated interval. Keep in mind that $\ell$ depends on $n$, and that we suppress the $n$-dependence of $m(n,\ell)$ in the notation.

†† By the tightness of $\{\xi_j\}$, we can always choose such a subsequence.


Similarly, the limit in the first line is $\int P_\omega(d\xi) g(\xi)$. In going from the second to the third line of (2.7), we used the facts that $G(\cdot,\cdot)$ is continuous, uniformly on compact $\xi$ sets, that $\sup_{m_{\ell+1} \ge j \ge m_\ell} |X_j - X_{m_\ell}| \to 0$ as $n \to \infty$, that $X_{m_\ell} \to x(s)$ and the tightness of the set $\bar{Q}$.

Due to the arbitrariness of $g(\cdot)$, and the uniqueness (A2.4) and to the equality of the last line of (2.7) with $\int P_\omega(d\xi) g(\xi)$, we have $P_\omega(\cdot) = P^{x(s)}(\cdot)$. Again, by the uniqueness, the limit does not depend on the chosen subsequence of $\bar{Q}$. Thus, if $\omega \notin N_0 \cup N(s)$, $Q_{K_n}(\omega,\ell,n,\cdot) \to P^{x(s)}(\cdot)$ weakly as $n \to \infty$, where $n$ now indexes the second chosen 'subsequence 2' of the theorem.

Using (A2.7) we now have (limits are on the 'subsequence 2')

(2.8)
$\lim_n \sum_{j=m_\ell}^{m_{\ell+1}-1} \frac{a_j}{\delta_n} \int P(X_{m_\ell},\xi_{m_\ell-1},m_\ell,j-m_\ell,d\xi',dx)\, P_x(\xi',1,d\xi)\, f(x,\xi)\, I_{K_n}(\xi_{m_\ell-1})$
$= \lim_n \sum_{j=m_\ell}^{m_{\ell+1}-1} \frac{a_j}{\delta_n} \int P(X_{m_\ell},\xi_{m_\ell-1},m_\ell,j-m_\ell-c,d\xi',dx')\, P(x',\xi',j-c,c,d\xi,dx)\, P_x(\xi,1,d\xi'')\, f(x,\xi'')\, I_{K_n}(\xi_{m_\ell-1})$
$= \lim_n \sum_{j=m_\ell}^{m_{\ell+1}-1} \frac{a_j}{\delta_n} \int P(X_{m_\ell},\xi_{m_\ell-1},m_\ell,j-m_\ell-c,d\xi',dx')\, P_{x'}(\xi',c+1,d\xi)\, f(x',\xi)\, I_{K_n}(\xi_{m_\ell-1})$
$= \lim_n \sum_{j=m_\ell}^{m_{\ell+1}-1} \frac{a_j}{\delta_n} \int P(X_{m_\ell},\xi_{m_\ell-1},m_\ell,j-m_\ell-c,d\xi')\, I_{K_n}(\xi_{m_\ell-1}) \Big[\int P_{X_{m_\ell}}(\xi',c+1,d\xi)\, f(X_{m_\ell},\xi)\Big]$
$= \lim_n \int Q_{K_n}(\omega,\ell,n,d\xi')\, P_{x(s)}(\xi',c+1,d\xi)\, f(x(s),\xi)$
$= \int P_\omega(d\xi')\, P_{x(s)}(\xi',c+1,d\xi)\, f(x(s),\xi)$
$= \int P^{x(s)}(d\xi)\, f(x(s),\xi).$
In going from the 3rd to the 4th and then to the 5th line we used the continuity in $(x,\xi')$ of $\int P_x(\xi',c+1,d\xi) f(x,\xi)$ and the facts concerning convergence cited below (2.7). In going from the next to last step to the last step, we used the fact that $P_\omega(\cdot) = P^{x(s)}(\cdot)$ is an invariant measure for the transition function $P_{x(s)}(\xi,j,\cdot)$ for each $x(s)$. The equality of the first and last lines of (2.8) for $\omega \notin N_0 \cup N(s)$ yields the desired result (2.6) for 'subsequence 2'. But, since each subsequence of subsequence 1 contains a further subsequence satisfying the requirements of subsequence 2 (but with perhaps a different $N(s)$), (2.6) holds for subsequence 1 also. Furthermore, the limits for all possible subsequence 1's differ only in the initial condition $x(0)$. Q.E.D.


Extension. In many problems of interest (see Section 4), the algorithm (1.1) takes the form $X_{n+1} = X_n + a_n f_n(\omega)$, where

$E[f_n(\omega) \mid X_i, \xi_{i-1}, i \le n] = F_n(X_n, \xi_{n-1}),$

and $F_n(x,\xi) \to F(x,\xi)$, a continuous function, uniformly in $(x,\xi)$ on compact sets. Then Theorem 1 still holds. This extension is useful when $f_n(\cdot)$ depends on variables other than $(X_n,\xi_n)$; for example, it might depend on a 'choice' or 'logical' variable $Z_n$, where $P\{Z_n = 1 \mid X_i, \xi_{i-1}, i \le n\} = q(X_n, \xi_{n-1})$, for some continuous function $q(\cdot)$.

Non unique $P^x(\cdot)$. A very similar result to Theorem 1 can be obtained when the uniqueness in (A2.4) is dropped. Let $\mathcal{P}_x = \{P^x_\alpha(\cdot), \alpha \in$ some set $A(x)\}$ denote the set of invariant measures for the transition function $P_x(\xi,j,\cdot)$. Assume that $\{\mathcal{P}_x, x \in$ any compact set$\}$ is tight. For each $x$, $\mathcal{P}_x$ is convex and weakly compact. Define the set

$C(x) = \{y : y = \int P^x_\alpha(d\xi) f(x,\xi),\ \alpha \in A(x)\}.$

Then $C(x)$ is closed and convex. The sets $\mathcal{P}_x$ and $C(x)$ are upper semi-continuous in $x$ in the Hausdorff topology (with the metrized weak topology on the space of distributions and the metric topology on $H$). This is a consequence of the fact that under the tightness of $\{\mathcal{P}_x\}$, if $x_n \to x$ and $P^{x_n}(\cdot) \in \mathcal{P}_{x_n}$, then $\{P^{x_n}(\cdot)\}$ is tight, and all weak limits are in $\mathcal{P}_x$ by (A2.3).

Theorem 2. Assume (A2.1)-(A2.8), with (A2.4) altered as above. Then $\{x^n(\cdot)\}$ is tight and any weak limit $x(\cdot)$ satisfies

(2.9)  $\dot{x} \in C(x)$ for almost all $\omega, t$.

Remarks on the proof. The proof is essentially the same as that of Theorem 1, and we only remark on a couple of points. By the argument of Theorem 1, if $\omega \notin N_0 \cup N(s)$ is fixed and $n$ indexes a weakly convergent subsequence of the set of measures $\bar{Q}$ defined above (2.7), then we must have

(2.10)  $\lim_n \sum_{j=m_\ell}^{m_{\ell+1}-1} \frac{a_j}{\delta_n} \int P(X_{m_\ell},\xi_{m_\ell-1},m_\ell,j-m_\ell,dx',d\xi')\, f(x',\xi')\, I_{K_n}(\xi_{m_\ell-1}) = \lim_n f_n(s) = \int f(x(s),\xi)\, P^{x(s)}_\alpha(d\xi) \in C(x(s)),$

for some $\alpha \in A(x(s))$, perhaps depending on $\omega$ and $s$ and on the selected subsequence. Under the weak convergence for the selected subsequence and the Skorokhod imbedding, $F_n(\cdot) \to F(\cdot)$ (which is absolutely continuous) uniformly on bounded time intervals w.p.1, but $f_n(s)$ does not necessarily converge in probability to $f(s) \equiv \dot{F}(s)$ as it did in Theorem 1. But, for each fixed $\omega \notin N_0$ and each $T < \infty$, $f_n(\cdot)$ (considered as a function on $[0,T]$) converges (along any 'subsequence 1') to $f(\cdot)$ weakly when these functions are considered as elements of $L_2[0,T]$. Thus for each $\omega \notin N_0$ there are $\{\beta_{ni}; i, n\}$ such that $0 \le \beta_{ni}$, $\sum_i \beta_{ni} = 1$, $\beta_{ni} \to 0$ as $n \to \infty$ for each $i$, and $\sum_i \beta_{ni} f_i(\cdot) \to f(\cdot)$ in the norm of $L_2[0,T]$. This convergence, together with the limit (2.10), the convexity and closure of $C(x)$ and the upper semi-continuity cited above Theorem 2 imply that $f(s) \in C(x(s))$ for almost all $(\omega,s)$.


The projection algorithm (1.2).

Recall the definition of $\pi_G$ from Section 1. Let $\bar{\pi}(h(\cdot))$ denote the (not necessarily unique) projection of the vector field $h(\cdot)$ onto $G$; i.e.,

(2.11)  $\bar{\pi}(h(x)) =$ set of limits $\lim_{\Delta \downarrow 0} [\pi_G(x + \Delta h(x)) - x]/\Delta.$
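The projection $\pi_G$ in (1.2) and the projected field (2.11) are easy to visualize when $G$ is a box, which has the form $\{x : q_i(x) \le 0\}$ with affine $q_i$. The sketch below assumes that special case; the function names and the finite step used to approximate the limit in (2.11) are illustrative choices, not the paper's.

```python
def project_box(y, lower, upper):
    """Closest point (Euclidean) to y in the box G = {x : lower <= x <= upper}."""
    return [min(max(v, lo), up) for v, lo, up in zip(y, lower, upper)]


def projected_step(x, a_n, f_val, lower, upper):
    """One step of the projected algorithm (1.2): pi_G(X_n + a_n f(X_n, xi_n))."""
    return project_box([xi + a_n * fi for xi, fi in zip(x, f_val)], lower, upper)


def projected_field(x, h, lower, upper, step=1e-6):
    """Finite-difference approximation of (2.11): [pi_G(x + step*h) - x] / step."""
    y = project_box([xi + step * hi for xi, hi in zip(x, h)], lower, upper)
    return [(yi - xi) / step for yi, xi in zip(y, x)]


if __name__ == "__main__":
    lower, upper = [0.0, 0.0], [1.0, 1.0]
    # On the face x_1 = 1, the outward component of h is removed.
    print(projected_field([1.0, 0.5], [2.0, -1.0], lower, upper))
    # In the interior, the field is unchanged (up to rounding).
    print(projected_field([0.5, 0.5], [2.0, -1.0], lower, upper))
```

For a box the closest point is obtained by clipping each coordinate, so the projected field simply drops the components of $h$ that point out of $G$ at the boundary.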


Theorem 3. Assume (1.2) and the conditions above it instead of (1.1), and assume (A2.1) to (A2.8), except that $X_n$ takes values in a Euclidean space. Then $\{x^n(\cdot)\}$ is tight and if $x(\cdot)$ is the limit of a weakly convergent subsequence, $x(\cdot)$ satisfies the 'projected' equation

(2.12)  $\dot{x} = \bar{\pi}(E^x f(x,\xi))$ for almost all $\omega, t$.

Recall the extension of Theorem 1 to the algorithm $X_{n+1} = X_n + a_n f_n(\omega)$, cited after Theorem 1. If $\pi_G(X_n + a_n f_n(\omega))$ is used, then Theorem 3 holds with
Remarks on the proof. The proof is quite similar to that of Theorem 1, and only a few remarks will be made. Use the partition of [2, eqn. (5.3.4)] to write (1.2) in the form

(2.13)  $X_{n+1} = X_n + a_n f(X_n,\xi_n) + a_n d_n,$

where $a_n d_n = \pi_G(X_n + a_n f(X_n,\xi_n)) - (X_n + a_n f(X_n,\xi_n))$. In [2], $a_n d_n$ is termed $\tau_n$. Using the notation of Theorem 1, define the piecewise constant function $d_n(\omega,t)$ by

$d_n(\omega,t) = \sum_{j=m_\ell}^{m_{\ell+1}-1} \frac{a_j}{\delta_n} d_j$  on $[t_{m_\ell} - t_n, t_{m_{\ell+1}} - t_n)$,

and set $D_n(\omega,t) = \int_0^t d_n(\omega,s)\,ds$. Then, as in Theorem 1, $\{x^n(\cdot),F_n(\cdot),D_n(\cdot)\}$ is tight. Extract a convergent subsequence with limit $(x(\cdot),F(\cdot),D(\cdot))$. Both $F(\cdot)$ and $D(\cdot)$ are absolutely continuous and $F(\cdot)$ satisfies (2.2). Write $f(\cdot) \equiv \dot{F}(\cdot)$ and define $d(\cdot)$ by

$D(t) = \int_0^t d(s)\,ds.$

Write

$f(t) = \bar{\pi}(f(t)) + \tilde{f}(t)$,  $\tilde{f}(t) = f(t) - \bar{\pi}(f(t))$,

the $\tilde{f}(\cdot)$ term being a 'projection error'. By the method of Theorem 1,

(2.14)
$x(t) - x(0) - F(t) - D(t) = 0$  w.p.1,
$\dot{x} = E^x f(x,\xi) + d = f + d,$
$\dot{x}(t) = \bar{\pi}(f(t)) + \tilde{f}(t) + d(t)$,  w.p.1.

Using the ideas of [2, Section 5.3], it can be shown that $-\int_0^t \tilde{f}(s)\,ds = D(t)$. This and (2.14) imply (2.12). We omit the rest of the details. We note only that the proof of the last equality uses the facts that if $X_{n+1} \in \partial G$, then $d_n$ is in the cone $-K(X_{n+1})$, and that if $x(t) \in \partial G$, then $\tilde{f}(t)$ is in the cone $K(x(t))$, where

$K(x) = \{y : y = \sum_{i : q_i(x) = 0} \lambda_i q_{i,x}(x)$, for some set of $\lambda_i \ge 0\}.$

Unbounded $f(\cdot)$.

We will use

(A2.9) There are a $K < \infty$ and a positive valued function $d(\cdot)$ such that $|f(x,\xi)| \le K(1 + d(\xi))$, and $x$ takes values in the Euclidean space $R^r$. For some $\alpha > 0$, $\sup_j E|d(\xi_j)|^{1+\alpha} < \infty$.

Theorem 4. Under (A2.9), and the tightness of $\{X_n\}$, both $\{x^n(\cdot)\}$ and $\{F_n(\cdot)\}$ are tight in $C^r[0,\infty)$.

Proof. Both $x^n(\cdot)$ and $F_n(\cdot)$ are sums of terms of the types $a_j f(X_j,\xi_j)$ and $a_j E_{m_i} f(X_j,\xi_j)$, for $j \ge m_i$, respectively. These are bounded by $a_j K(1 + d(\xi_j))$ and $a_j K(1 + E_{m_i} d(\xi_j))$, respectively. But by (A2.9), both $\{d(\xi_j)\}$ and $\{E_{m_i} d(\xi_j) : i, j;\ j \ge m_i\}$ are uniformly integrable, which implies the theorem. Q.E.D.
Given the tightness, the only further impediment to the result of Theorem 1 for the unbounded $f(\cdot,\cdot)$ case concerns the meaning of the integrals in (2.8). A truncation and limit argument seems the most natural. We simply take the following natural approach.

Suppose that there is a sequence $\{f_L(x,\xi), L = 1, 2, \ldots\}$ each member of which satisfies (A2.6,7), where $c$ doesn't depend on $L$, and such that $E^x f_L(x,\xi) \to E^x f(x,\xi)$ uniformly on any compact set, as $L \to \infty$. Let (see (A2.9)) $|f(x,\xi)| \le K(1 + d(\xi))$, and let $f_L(x,\xi) = f(x,\xi)$ when $d(\xi) \le L$. For each $T < \infty$, let

$E \sum_{j=n}^{m(t_n+T)} a_j d(\xi_j) I\{d(\xi_j) \ge L\} \to 0$ as $L \to \infty$, uniformly in $n$.

Then under (A2.1,2,3,4,5,8), the conclusions of Theorems 1 and 3 hold. The condition of the next to last sentence is guaranteed by (A2.9) and also implies the tightness.

3. The Non-Markov Case.

The ideas of the last section can be extended to some interesting non-Markov systems where, loosely speaking, if $X_n \equiv x$ (a constant) for all $-\infty < n < \infty$, then $\{\xi_n\}$ is stationary and has certain mixing properties. We next state some assumptions, which are modifications of some in Section 2. Then a general convergence theorem is proved. Lastly, it will be shown that the assumptions hold in many cases of interest.

In particular, in Theorem 8 we verify (A3.2,3) when $\{\xi_n\}$ is not state dependent and satisfies a type of $\phi$-mixing condition. This case is of interest, since the non-Markov noise and discontinuous dynamics case is usually hard and occurs frequently. In this 'non-state dependent' case, the measure $P_x(\xi,1,\cdot) = P(\xi,1,\cdot)$ below would not depend on $x$, and would equal the stationary conditional distribution $P\{\xi_1 \in \cdot \mid \xi_0 = \xi\}$ provided that this stationary measure is weakly continuous in $\xi$.


A3.1. $X_n \in H$, a compact metric space, and $f(\cdot)$ is bounded.

The limits in (A3.2) and (A3.3) are in the sense of probability.

A3.2. For each $x$, there is a transition function $P_x(\xi,1,\cdot)$ which is weakly continuous in $(x,\xi)$ and such that for each bounded and continuous $g(\cdot)$, with $G(x,\xi) = \int P_x(\xi,1,d\xi') g(\xi')$,

$\lim_{n \to \infty} \Big[\int P(d\xi_j \mid X_j, \xi_{j-1}; X_u, \xi_{u-1}, u \le n)\, g(\xi_j) - G(X_j, \xi_{j-1})\Big] = 0.$

A3.3. Define $F(x,\xi) = \int f(x,\xi') P_x(\xi,1,d\xi')$. Then $F(\cdot,\cdot)$ is continuous and

$\lim_{n \to \infty} \Big[\int P(d\xi_j \mid X_j, \xi_{j-1}; X_u, \xi_{u-1}, u \le n)\, f(X_j,\xi_j) - F(X_j, \xi_{j-1})\Big] = 0.$

A3.4. For the Markov process with transition function $P_x(\xi,j,\cdot)$ (which is obtained recursively from $P_x(\xi,1,\cdot)$ as above (A2.4)), there is a unique invariant measure $P^x(\cdot)$. The set $S$ is compact. (Hence, $\{P^x(\cdot), x \in H\}$ is tight.)


We define the measure $Q(\omega,\ell,n,\cdot)$ similarly to that in Section 2; i.e., by

$\int Q(\omega,\ell,n,d\xi)\, g(\xi) = \sum_{j=m_\ell}^{m_{\ell+1}-1} \frac{a_j}{\delta_n} \int P(d\xi_j \mid X_u, \xi_{u-1}, u \le m_\ell)\, g(\xi_j),$

where $g(\cdot)$ is an arbitrary bounded measurable function. (Here, we use $\xi_j$ in defining $Q(\cdot)$; in Theorem 1, $\xi_{j-1}$ was used. The choice is unimportant and is due to notational convenience.)
Theorem 5. Assume (A3.1 to 4) and (A2.5,6). Then $\{x^n(\cdot)\}$ is tight and the limit $x(\cdot)$ of any weakly convergent subsequence satisfies (2.1). If (1.2) is used in lieu of (1.1) and the conditions above (1.2) hold, with $X_n$ in some Euclidean space, then $\{x^n(\cdot)\}$ is tight and the limit of any weakly convergent subsequence satisfies (2.12).

Proof. The proof is close to that of Theorem 1 and we use the same terminology, but with the measure $Q(\omega,\ell,n,\cdot)$ defined as above and satisfying the conditions above (A2.8), and in the proof of Theorem 1. Since $S$ is compact the truncation factor $I_K$ used in Theorem 1 is not required. In the proof, we take $X_n \in R^r$, Euclidean $r$-space. The details in general are very similar to the details in this special case. Clearly $\{x^n(\cdot),F_n(\cdot)\}$ is tight in $C^{2r}[0,\infty)$. Extract a weakly convergent subsequence (also indexed by $n$ and called subsequence 1) with limit $(x(\cdot),F(\cdot))$. We work with this sequence or subsequences of it, henceforth. By the Skorokhod imbedding, there is a null set $N_0$ such that the limit can be taken to be uniform on bounded intervals, for $\omega \notin N_0$.

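Let $g(\cdot)$ be bounded and continuous.

By (A3.2), in getting the limit in probability (as $n \to \infty$) of an expression of
the form

$$(3.1)\qquad \sum_{j=m_\ell}^{m_{\ell+1}-1}\frac{a_j}{\delta_n}\iint P(d\xi_j, dX_j \mid X_u,\xi_{u-1},\ u\le m_\ell)\,g(\xi_j)$$
$$\qquad\quad = \sum_{j=m_\ell}^{m_{\ell+1}-1}\frac{a_j}{\delta_n}\int P(dX_j, d\xi_{j-1} \mid X_u,\xi_{u-1},\ u\le m_\ell)\int P(d\xi_j \mid X_j,\xi_{j-1};\ X_u,\xi_{u-1},\ u\le m_\ell)\,g(\xi_j),$$

we can substitute $G(X_j,\xi_{j-1})$ for $\int P(d\xi_j \mid X_j,\xi_{j-1};\ X_u,\xi_{u-1},\ u\le m_\ell)\,g(\xi_j)$,
when $m_{\ell+1} > j \ge m_\ell$. Fix $s$. For each $n$, fix $m_\ell = m(\ell,n)$ such that
$s \in [\,t_{m_\ell} - t_n,\ t_{m_{\ell+1}} - t_n)$.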
Under the Skorokhod imbedding, $E|G(X_j,\xi_{j-1}) - G(x(s),\xi_{j-1})| \to 0$. Thus, by the
last three sentences,

$$E\Big|\int Q(\omega,\ell,n,d\xi)\,[g(\xi) - G(x(s),\xi)]\Big| \to 0.$$

Choose a subsequence (called subsequence 2) for which

$$\int Q(\omega,\ell,n,d\xi)\,[g(\xi) - G(x(s),\xi)] \to 0 \quad \text{w.p.1}$$

for a countable dense set of bounded and continuous $g(\cdot)$, hence for all
bounded and continuous $g(\cdot)$. Denote the exceptional $\omega$-set by $N(s)$. For
fixed $\omega \notin N_0 \cup N(s)$, choose a further subsequence (termed subsequence 3)
for which $Q(\omega,\ell,n,\cdot)$ converges weakly to some measure $P_\omega(\cdot)$.
Then

$$\int P_\omega(d\xi)\,g(\xi) = \int P_\omega(d\xi)\,G(x(s),\xi) = \iint P_\omega(d\xi)\,P^{x(s)}(\xi,1,d\xi')\,g(\xi').$$

Thus, by uniqueness, $P_\omega(\cdot) = P^{x(s)}(\cdot)$, a result which does not depend on the
particular chosen subsequence 3. Hence $Q(\omega,\ell,n,\cdot) \to P^{x(s)}(\cdot)$ weakly along
subsequence 2, for almost all $\omega$.
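The role of uniqueness in this step can be illustrated numerically. The following is only a sketch under stated assumptions, not the construction used in the proof: the three-state noise chain `transition_matrix(x)` and all numerical values are hypothetical, and the plain empirical occupation measure stands in for the weighted measure $Q$. For a frozen state $x$ the chain is irreducible and aperiodic, so its invariant measure is unique and the occupation frequencies settle near it.

```python
import random


def transition_matrix(x):
    """Hypothetical 3-state noise chain whose one-step law depends on x in [0, 1]."""
    return [
        [0.5, 0.5 - 0.3 * x, 0.3 * x],
        [0.2, 0.6, 0.2],
        [0.3 * x, 0.5 - 0.3 * x, 0.5],
    ]


def invariant_measure(P, iters=500):
    """Power iteration pi <- pi P; it converges here because the chain is
    irreducible and aperiodic, so the invariant measure is unique."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi


def occupation_measure(P, n_steps, seed=0):
    """Empirical fraction of time the simulated chain spends in each state."""
    rng = random.Random(seed)
    k = len(P)
    counts = [0] * k
    state = 0
    for _ in range(n_steps):
        state = rng.choices(range(k), weights=P[state])[0]
        counts[state] += 1
    return [c / n_steps for c in counts]


if __name__ == "__main__":
    P = transition_matrix(0.5)
    print("invariant :", invariant_measure(P))
    print("occupation:", occupation_measure(P, 20000))
```

Changing `x` changes the fixed point of `invariant_measure`, which is the finite-state analogue of the dependence of $P^{x(s)}(\cdot)$ on the limit point $x(s)$.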

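Using this last result and (A3.3), and a factorization similar to the
one used in (2.8), we get that

$$(3.2)\qquad \sum_{j=m_\ell}^{m_{\ell+1}-1}\frac{a_j}{\delta_n}\int P(d\xi_j\, dX_j \mid X_u,\xi_{u-1},\ u\le m_\ell)\,f(X_j,\xi_j) \;\to\; \int P^{x(s)}(d\xi)\,f(x(s),\xi)$$

in probability as $n \to \infty$ along subsequence 2. Since the same argument applies to
any subsequence of subsequence 1, (3.2) holds in probability as $n \to \infty$ along
subsequence 1.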


We  omit  the  details  for  the  projection  algorithm.  These  use  an  adaptation 
of  the  above  method  which  is  analogous  to  the  modification  of  the  proof  of 
Theorem  1  which  is  used  in  Theorem  3.  Q.E.D. 

Remark. Let $X_{n+1} = X_n + a_n f_n(\omega)$ replace (1.1), and suppose that there is a
continuous $F(\cdot)$ such that

$$E[f_j(\omega) \mid X_j,\xi_{j-1};\ X_u,\xi_{u-1},\ u\le n] - F(X_j,\xi_{j-1}) \to 0$$

as $j - n \to \infty$ and $n \to \infty$. Then the Theorem continues to hold.

We now examine (A3.2,3) under the following $\phi$-mixing condition, where
the noise does not depend on the state.

A3.5. Let $S$ be compact. Let $\mathcal{F}_0^m$ and $\mathcal{F}_m$ denote the $\sigma$-algebras
which measure $\{\xi_j,\ j \le m\}$ and $\{\xi_j,\ j \ge m\}$, resp. For any $A \in \mathcal{F}_0^m$, $B \in \mathcal{F}_{n+m}$,
$|P(AB) - P(A)P(B)| \le \phi_n P(A)$ and $\le \phi_n P(B)$, uniformly in $m$, where $0 \le \phi_n \to 0$.


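The following result is well known.

Lemma 6. Let $A_{n+m} \in \mathcal{F}_{n+m}$, $A_m \in \mathcal{F}_0^m$, and assume (A3.5).
Then as $n \to \infty$,

$$E\,\big|P\{A_{n+m} \mid \mathcal{F}_0^m\} - P\{A_{n+m}\}\big| \to 0,$$
$$E\,\big|P\{A_m \mid \mathcal{F}_{n+m}\} - P\{A_m\}\big| \to 0,$$

and

$$E\,\big|P\{A_m \mid \mathcal{F}_{n+m}\} - P\{A_m\}\big|\, I_{A_{n+m}} \le \phi_n P\{A_{n+m}\} \ \text{ and } \le \phi_n P\{A_m\},$$
$$E\,\big|P\{A_{n+m} \mid \mathcal{F}_0^m\} - P\{A_{n+m}\}\big|\, I_{A_m} \le \phi_n P\{A_{n+m}\} \ \text{ and } \le \phi_n P\{A_m\}.$$

Let $|g_j| \le 1$ with $g_j$ being $\mathcal{B}_j$ measurable. Then as $n \to \infty$,

$$E\,\big|E[g_{n+m} \mid \mathcal{F}_0^m] - E g_{n+m}\big| \to 0, \qquad E\,\big|E[g_m \mid \mathcal{F}_{n+m}] - E g_m\big| \to 0,$$

where the convergences are uniform in $\{g_j\}$, $\{A_m, A_{m+n}\}$, and $m$.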


Theorem 7. Assume (A3.5). Let $g_j$ be $\mathcal{B}_j$ measurable with $|g_j| \le 1$, and
let $j > n$. Then $E[g_j \mid \mathcal{F}_0^n \cup \mathcal{B}_{j-1}] - E[g_j \mid \mathcal{B}_{j-1}] \to 0$ as $j - n \to \infty$,
uniformly in $j$ and $\{g_j\}$.

Proof. Let $G_{j-1} \in \mathcal{B}_{j-1}$ and $G_{0,n} \in \mathcal{F}_0^n$. The $v^i_{n,j}$ below are uniformly
bounded and $E|v^i_{n,j}|\, I_{G_{j-1}} \le 2\phi_{j-n} P\{G_{j-1}\}$ by Lemma 6. We have

$$(3.3)\qquad \int_{G_{j-1}\cap G_{0,n}} E[g_j \mid \mathcal{F}_0^n \cup \mathcal{B}_{j-1}]\,dP = \int_{G_{j-1}\cap G_{0,n}} g_j\,dP = \int_{G_{j-1}} E[g_j I_{G_{0,n}} \mid \mathcal{B}_{j-1}]\,dP$$
$$\qquad\quad = \cdots = \int_{G_{j-1}\cap G_{0,n}} E[g_j \mid \mathcal{B}_{j-1}]\,dP + \int_{G_{j-1}} v_{n,j}\,dP.$$
Now, suppose that the theorem is false. Then there is a sequence of sets
$H_{j,n} \in \mathcal{F}_0^n \cup \mathcal{B}_{j-1}$, and an $\varepsilon > 0$ such that for some sequence $\{g_j\}$ and $j - n \to \infty$,

$$(3.4)\qquad \int_{H_{j,n}} \big(E[g_j \mid \mathcal{F}_0^n \cup \mathcal{B}_{j-1}] - E[g_j \mid \mathcal{B}_{j-1}]\big)\,dP \ge \varepsilon$$

(and/or $\le -\varepsilon$; we use (3.4) only for simplicity). For each $\delta > 0$, there
are sets $G^i_{j-1} \in \mathcal{B}_{j-1}$ and $G^i_{0,n} \in \mathcal{F}_0^n$, with $\{G^i_{j-1},\ i = 1,2,\ldots\}$ disjoint and
$P\{H^\delta_{j,n}\,\Delta\,H_{j,n}\} \le \delta$, where $H^\delta_{j,n} = \bigcup_i [G^i_{j-1} \cap G^i_{0,n}]$. Now, re-do the calculation
(3.3) with $G_{j-1}$ and $G_{0,n}$ superscripted by $i$, and the integrals summed
over $i$. For small enough $\delta > 0$, this yields a contradiction to (3.4), since
$\sum_i E|v^i_{n,j}|\, I_{G^i_{j-1}} \to 0$ as $j - n \to \infty$. Q.E.D.


Theorem 8. Assume (A3.1,5). Let $\{\xi_j\}$ not depend on $\{X_j\}$; i.e., for
all $j$,

$$P(d\xi_j \mid \xi_{u-1}, X_u,\ u \le j) = P(d\xi_j \mid \xi_{u-1},\ u \le j).$$

Suppose that there is a measure $P(\xi,1,\cdot)$ on Borel sets of $S$ such that
$P(\cdot,1,B)$ is measurable for each Borel $B \subset S$. For each bounded, real valued
and continuous $g(\cdot)$, let

$$\int g(\xi_j)\,P(d\xi_j \mid \xi_{j-1} = \xi) = \int g(\xi')\,P(\xi,1,d\xi') = G(\xi) \quad\text{for all } \xi,$$

$$(3.5)\qquad \int f(x,\xi_j)\,P(d\xi_j \mid \xi_{j-1} = \xi) = \int f(x,\xi')\,P(\xi,1,d\xi') = F(x,\xi) \quad\text{for each } x \text{ and } \xi,$$

where $F(\cdot,\cdot)$ and $G(\cdot)$ are continuous. Then (A3.2,3) hold.

The proof follows from Theorem 7, by letting $g_j$ be either $g(\xi_j)$ or $f(x,\xi_j)$.
4. Examples

4a.  Application  to  a  Routing  Problem. 

To illustrate the power of the method, we consider the automata routing
example described in [8, Section 3]. Calls arrive at a transmitting or switching
terminal at random, at discrete time instants $n = 0,1,2,\ldots$, with
P{one call arrives at $n$-th instant} $= \mu$, $\mu \in (0,1)$, P{$> 1$ call arrives at
$n$-th instant} $= 0$. From the terminal, there are two possible routings to the
destination, route 1 and route 2; the $i$-th route has $N_i$ independent lines
and can thus handle up to $N_i$ calls simultaneously. Let $[n, n+1)$ denote
the $n$-th interval of time. The duration of each call is a random variable
with a geometric distribution: P{call completed in the $(n+1)$st interval |
uncompleted at end of $n$-th interval, route $i$ used} $= \lambda_i$, $\lambda_i \in (0,1)$. The
members of the double sequence of the interarrival times and call durations
are mutually independent. In [8], the "gain" per step was a constant, and a
detailed study was made of the rate of convergence. Here, we do a stochastic
approximation version; i.e., $a_n \to 0$. But the case where $a_n = a > 0$ is
handled in the same way. Let $\{y_n\}$ denote a sequence of random variables with
values in $[0,1]$. To get an unambiguous formulation, suppose that calls
terminating in the $n$-th interval actually terminate at $n + 1/2$, and arrivals and route
assignments are at the instants $0,1,2,\ldots$. Define $\xi_n = (\xi_n^1, \xi_n^2)$ = route
occupancy process (denoted differently in [8]), where $\xi_n^i$ = number of lines of route $i$
occupied at time $n^+$. If a call arrives at instant $n+1$, the automaton "flips
a coin", choosing route 1 with probability $y_n$ and route 2 with probability
$(1-y_n)$. If all lines of the chosen route $i$ are occupied at instant $(n+1)^-$,
then the call is switched to route $j$ ($j \ne i$). If all lines of route $j$ are also
occupied at instant $(n+1)^-$, then the call is rejected, and disappears from the
system. The model can be generalized considerably, both in the number of lines
and switching nodes, and in the input and call length statistics. Let $J_{in}$
denote the indicator of the event {call arrives at $n + 1$, is assigned first to
route $i$ and is accepted by route $i$}. The algorithm is (4.1), where
$0 < \alpha < \beta < 1$ are truncation points, and $y_n \in [\alpha,\beta]$. The bar $|_\alpha^\beta$ denotes truncation.

$$(4.1)\qquad y_{n+1} = \big[\,y_n + a_n(1-y_n)J_{1n} - a_n y_n J_{2n}\,\big]\big|_\alpha^\beta.$$


Here, $P\{\xi_{n+1} = \xi' \mid y_n = y,\ \xi_n = \xi\}$ is a continuous function of $(y,\xi)$. The
Markov chain is $\{y_n, \xi_n\}$, not $\{X_n, \xi_{n-1}\}$. For each fixed $y \in [\alpha,\beta]$, $\{\xi_n(y),\ n \ge 0\}$
has a unique invariant measure $P^y(\cdot)$, and $E[J_{in} \mid y_\ell, \xi_\ell,\ \ell \le n] = F_i(y_n,\xi_n)$,
where $F_i(\cdot,\cdot)$ is a continuous function of $y$ for each (discrete) $\xi$.
Define $y^n(\cdot)$ as $x^n(\cdot)$ was defined. By Theorem 1 or 3 and the extension cited
after the Theorem statement we immediately get the correct ODE (which must be
satisfied by all the weak limits of $\{y^n(\cdot)\}$)

$$(4.2)\qquad \dot y = [(1-y)E^y J_{1n} - yE^y J_{2n}] \quad\text{for } y \in (\alpha,\beta),$$

$y(\cdot)$ stops on first hitting $\alpha$ or $\beta$.


Simple! No analysis of rates of convergence of $n$-step transition functions, etc.
is required. Also, no analysis of the $x$-dependence of the $\{\xi_n\}$ or $\{\xi_n(x)\}$ is
required. The model upon which the analysis in [8] was based appeared in [9].
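The routing automaton and the recursion (4.1) can be simulated directly. The following is a minimal sketch, assuming particular parameter values (`mu`, `lam`, `lines`, `alpha`, `beta`) and a gain sequence $a_n = 1/(n+10)$, none of which come from the text; within each interval, call completions are processed before the arrival at instant $n+1$, which is one way to realize the timing convention described above.

```python
import random


def simulate_routing(n_steps, mu=0.6, lam=(0.3, 0.2), lines=(3, 3),
                     alpha=0.05, beta=0.95, y0=0.5, seed=0):
    """Simulate the two-route automaton; return the path y_0, ..., y_{n_steps}."""
    rng = random.Random(seed)
    occ = [0, 0]            # xi_n: number of occupied lines on each route
    y = y0
    path = [y]
    for n in range(n_steps):
        a_n = 1.0 / (n + 10)        # hypothetical gain sequence with a_n -> 0
        # Each call in progress on route i completes with probability lam[i].
        for i in (0, 1):
            done = sum(1 for _ in range(occ[i]) if rng.random() < lam[i])
            occ[i] -= done
        J = [0, 0]                  # J[i] = 1 iff first choice is i and i accepts
        if rng.random() < mu:       # a call arrives at instant n + 1
            first = 0 if rng.random() < y else 1
            other = 1 - first
            if occ[first] < lines[first]:
                occ[first] += 1
                J[first] = 1
            elif occ[other] < lines[other]:
                occ[other] += 1     # overflow to the other route
            # otherwise both routes are full and the call is rejected
        # Recursion (4.1), followed by truncation to [alpha, beta].
        y = y + a_n * (1.0 - y) * J[0] - a_n * y * J[1]
        y = min(max(y, alpha), beta)
        path.append(y)
    return path


if __name__ == "__main__":
    print(simulate_routing(20000)[-1])
```

With a decreasing gain the path eventually moves slowly, and its interpolation is the object whose weak limits satisfy (4.2); the sketch makes no claim about where those limits lie for these parameter values.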

4b. An adaptive quantizer. Efficient quantization of signals in telecommunications
systems is of considerable current interest (e.g., of voice signals in
telephone systems). Let the signal process $z(\cdot)$ be sampled at instants $n\Delta$,
$n = 0,1,\ldots$, and let the samples $\{z(n\Delta)\}$ be quantized and then transmitted.
Adaptive quantizers have been studied as a means to more efficient quantization:
the quantization scale for 'large' signals should be different from that for
'small' signals. An adaptive quantizer studied in [10,11] takes roughly the
following form. We use $a_n = \varepsilon$, a constant. Let $0 = \psi_0 < \psi_1 < \cdots < \psi_L = \infty$,
$0 = \eta_1 < \eta_2 < \cdots < \eta_L$, where the $\eta_i$ are real numbers. For a scaling
parameter $y > 0$, define the quantization function $q(\cdot)$. For $z(n\Delta) \ge 0$, set
$q(z(n\Delta)) = y\eta_i$ if $z(n\Delta) \in [y\psi_{i-1}, y\psi_i)$, and set $q(-z) = -q(z)$. The parameter
$y$ should vary with the signal power. To get the adaptive quantizer of concern,
fix real numbers $0 < M_1^\varepsilon < M_2^\varepsilon < \cdots < M_L^\varepsilon < \infty$, where $M_1^\varepsilon < 1$, $M_L^\varepsilon > 1$, and set
$\beta \in (0,1]$. Let $0 < y_\ell < y_u < \infty$. Then we adapt the scale $y$ according to

$$(4.3)\qquad y^\varepsilon_{n+1} = \big[(y^\varepsilon_n)^\beta B^\varepsilon_n\big]\big|_{y_\ell}^{y_u},$$

where $B^\varepsilon_n = M^\varepsilon_i$ if $|z(n\Delta)| \in [y^\varepsilon_n\psi_{i-1},\ y^\varepsilon_n\psi_i)$.
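One step of such a quantizer can be written out as follows. This is a sketch under assumptions, not the implementation of [10,11]: the threshold list `psi` (starting at 0 and ending at infinity), the output levels `eta`, the multipliers `M`, and the clipping of the scale to `[y_lo, y_hi]` are illustrative choices, and cells are taken half-open on the right.

```python
import bisect
import math


def cell_index(abs_z, y, psi):
    """Return i in 1..L with abs_z in [y*psi[i-1], y*psi[i]).
    Assumes psi[0] == 0, psi[-1] == inf, y > 0, and abs_z finite."""
    thresholds = [y * p for p in psi]
    return bisect.bisect_right(thresholds, abs_z)


def quantize(z, y, psi, eta):
    """Odd-symmetric quantizer: output level y*eta[i-1] for the cell of |z|."""
    i = cell_index(abs(z), y, psi)
    q = y * eta[i - 1]
    return q if z >= 0 else -q


def adapt_scale(y, z, psi, M, beta, y_lo, y_hi):
    """Scale update y <- y**beta * M_i, clipped to [y_lo, y_hi]
    (the clipping is an assumption of this sketch)."""
    i = cell_index(abs(z), y, psi)
    y_next = (y ** beta) * M[i - 1]
    return min(max(y_next, y_lo), y_hi)


if __name__ == "__main__":
    PSI = [0.0, 1.0, 2.0, math.inf]    # hypothetical thresholds
    ETA = [0.0, 1.5, 3.0]              # hypothetical output levels
    M = [0.9, 1.0, 1.2]                # hypothetical multipliers, M[0] < 1 < M[-1]
    y = 1.0
    for z in (0.3, -1.2, 5.0, 0.1):
        print(z, quantize(z, y, PSI, ETA), end=" ")
        y = adapt_scale(y, z, PSI, M, beta=0.98, y_lo=0.1, y_hi=10.0)
        print(y)
```

Large samples fall in the outer cell and enlarge the scale, while small samples shrink it, which is the sense in which the scale tracks the signal power.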


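We do an asymptotic analysis of the sequence $y^\varepsilon(\cdot)$, defined as the piecewise
linear interpolation of the function with values $y^\varepsilon_n$ at time $n\varepsilon$. Let
$y^\varepsilon_0 = y_0 \in [y_\ell, y_u]$.

Now define $b_1 < \cdots < b_L$, $b_1 < 0$, $b_L > 0$, and $\alpha > 0$ such that
$\varepsilon\alpha < 1$. Then set $M^\varepsilon_i = (1 + \varepsilon b_i)$, $\beta = 1 - \varepsilon\alpha$. Then using
$y^{1-\varepsilon\alpha} = y[1 - \varepsilon\alpha \log y] + O(\varepsilon^2)$ and $(1 + \varepsilon b^\varepsilon_n) = B^\varepsilon_n$,

$$(4.4)\qquad y^\varepsilon_{n+1} = \big[\,y^\varepsilon_n(1 + \varepsilon b^\varepsilon_n) - \varepsilon\alpha\, y^\varepsilon_n \log y^\varepsilon_n + O(\varepsilon^2)\,\big]$$
$$\qquad\qquad\ \ = \big[\,y^\varepsilon_n + \varepsilon F(y^\varepsilon_n, z(n\Delta)) + O(\varepsilon^2)\,\big].$$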
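The expansion behind (4.4) can be spelled out term by term. The steps below are a sketch of one consistent reading; the symbol $b$ (the value $b_i$ of the cell containing $|z|$) and the closed form of $F$ in the last line are inferred from the two lines of (4.4) and are not stated explicitly in the text. The $O(\varepsilon^2)$ terms are uniform for $y$ in a compact subset of $(0,\infty)$.

```latex
\begin{align*}
y^{1-\varepsilon\alpha}
  &= y\,e^{-\varepsilon\alpha\log y}
   = y\,[\,1 - \varepsilon\alpha\log y\,] + O(\varepsilon^2), \\
y^{1-\varepsilon\alpha}\,(1+\varepsilon b)
  &= y\,(1+\varepsilon b) - \varepsilon\alpha\,y\log y + O(\varepsilon^2) \\
  &= y + \varepsilon\,y\,[\,b - \alpha\log y\,] + O(\varepsilon^2), \\
F(y,z) &= y\,[\,b_i - \alpha\log y\,]
  \quad\text{if } |z| \in [\,y\psi_{i-1},\, y\psi_i).
\end{align*}
```

The cross term $\varepsilon b \cdot \varepsilon\alpha\, y \log y$ is absorbed into $O(\varepsilon^2)$, which is why only the first-order terms survive in the second line of (4.4).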

Assume further that $Z(\cdot)$ is a stationary (finite order) Gauss Markov
process with Cov $Z(t) > 0$ and let $z(t) = h'Z(t)$, for some vector $h \ne 0$. In
this example, the noise does not depend on the state and so the analysis is quite
simple, even though $z(\cdot)$ is not a bounded process. Define $EF(y,z(0)) = \bar F(y)$.
Then $\bar F(y)$ has a unique zero $\bar y$ on $(0,\infty)$, and $\bar F(y)$ is positive for $y < \bar y$ and
negative for $y > \bar y$ [8, Section 7]. In [8, Sections 7 to 9], there is a detailed
investigation of the limit of $[y^\varepsilon_n - \bar y]/\sqrt{\varepsilon}$. Here, we are only concerned with
the simpler question of the limit of $y^\varepsilon(\cdot)$. For $c \ge 0$,
$E_{Z(0)}F(y, z(n\Delta+\Delta+c\Delta))$ is continuous in $Z(0), y$ and tends to $\bar F(y)$ in the mean,
uniformly in $y \in [y_\ell, y_u]$, as $c \to \infty$. This fact and the method of proof of
Theorem 1 or of Theorem 3 and the extension cited after the theorems imply
immediately that the weak limit of $\{y^\varepsilon(\cdot)\}$ satisfies (4.5).

$$(4.5)\qquad \dot y = \bar F(y),\quad y(0) = y_0, \quad\text{for } y \in (y_\ell, y_u);$$

$y(\cdot)$ stops on first hitting $y_\ell$ or $y_u$.
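The limit ODE (4.5) is easy to explore numerically in a simplified setting. The sketch below assumes a scalar signal sample $z \sim N(0,\sigma^2)$ and the form $F(y,z) = y[b_i - \alpha \log y]$ on the cell containing $|z|$, which is one reading of (4.4); the thresholds `psi`, the values `b`, and all constants are hypothetical. It computes the averaged drift, locates its zero by bisection, and integrates the ODE by Euler steps.

```python
import math


def cell_probs(y, psi, sigma):
    """P(|z| in [y*psi[i-1], y*psi[i])) for z ~ N(0, sigma^2), i = 1..L."""
    def cdf_abs(t):  # P(|z| <= t)
        return 1.0 if math.isinf(t) else math.erf(t / (sigma * math.sqrt(2.0)))
    edges = [y * p for p in psi]
    return [cdf_abs(edges[i + 1]) - cdf_abs(edges[i]) for i in range(len(psi) - 1)]


def mean_drift(y, psi, b, alpha, sigma):
    """Averaged drift: y * (sum_i b_i p_i(y) - alpha * log y)."""
    probs = cell_probs(y, psi, sigma)
    return y * (sum(bi * pi for bi, pi in zip(b, probs)) - alpha * math.log(y))


def find_zero(psi, b, alpha, sigma, lo, hi, iters=80):
    """Bisection, assuming the drift is positive at lo and negative at hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_drift(mid, psi, b, alpha, sigma) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def integrate_ode(y0, psi, b, alpha, sigma, y_lo, y_hi, h=0.01, n_steps=5000):
    """Euler scheme for dy/dt = mean_drift(y); stops on reaching y_lo or y_hi."""
    y = y0
    for _ in range(n_steps):
        if y <= y_lo or y >= y_hi:
            break
        y += h * mean_drift(y, psi, b, alpha, sigma)
    return y


if __name__ == "__main__":
    PSI = [0.0, 1.0, 2.0, math.inf]
    B = [-1.0, 0.0, 1.0]
    y_bar = find_zero(PSI, B, 0.5, 1.0, 0.1, 10.0)
    print(y_bar, integrate_ode(0.2, PSI, B, 0.5, 1.0, 0.1, 10.0))
```

In this setting the bracketed factor is strictly decreasing in $y$, so the drift changes sign exactly once, in line with the sign pattern quoted from [8, Section 7]; trajectories started on either side of the zero move toward it.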

